AIbase

# End-to-end speech model

Voila Chat
MIT
Voila is a series of large-scale speech-language foundation models designed for human-computer voice interaction.
Text-to-Audio, Transformers, Supports Multiple Languages
maitrix-org
2,423
32
Llama3.1 Typhoon2 Audio 8b Instruct
Typhoon 2-Audio Edition is an end-to-end speech-to-speech model that accepts audio, speech, and text inputs and generates text and speech outputs simultaneously. It is optimized for Thai and also supports English.
Text-to-Audio, Transformers, Supports Multiple Languages
scb10x
664
9
Flow Mirror
Apache-2.0
FlowMirror is an end-to-end speech model developed by Zhejiang Jingzhunxue AI Lab. It supports voice dialogue, ASR, and TTS, with a focus on educational applications.
Text-to-Audio, Transformers
jzx-ai-lab
21
2
Mms Tts Vie
A Vietnamese text-to-speech model developed by Meta, based on the VITS architecture.
Speech Synthesis, Transformers
facebook
3,616
27
W2v Timit Ft 4001
A speech recognition model based on the Wav2Vec 2.0 architecture and fine-tuned on the TIMIT dataset, suitable for English speech-to-text tasks.
Speech Recognition, Transformers
devin132
22
0
© 2025 AIbase